-
Notifications
You must be signed in to change notification settings - Fork 8
epi_df
argument refactoring
#460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Nice lists; some notes/opinions:
|
time_value = time, datetime |
So, if both
oh yeah that makes sense thanks!
I guess I should just add whichever of these are supported by guess_time_type? The multi-column case does make sense, but I think I'm going to put that as out of scope for now and leave it as a future enhancement. |
In that instance, yes, although I'd also be fine with guessing it to be the But suppose there is only a
Sorry, this was mostly me reasoning about why it's good to exclude these. You could maybe accept |
On |
Looking into actually implementing the I think this is ready if someone wants to do a review. @lcbrooks I think most of the concerning cases you're thinking about will be taken care of by either multiple names triggering, and thus forcing an error. Seems like there are legit use cases for |
But even if we have chopped off target date, we would not want forecast date to be the time value. We would want to first reattach the target date then convert.. if we were ever to put these in an epi df at all, it seems a bit of a mismatch vs a dedicated predictions format or archive. I think forecast date should just be excluded from the considered set. |
If someone were trying to say, smooth the scores, I could see using |
At the same time, we've had bugs from lining up forecast dates of forecasts with time values of signals, and making the default make this easier seems undesirable. |
This is more into personal preferences, but I'd also actually probably prefer the prescriptive approach for data structures [or just column names, but we force people to use time_value regardless of what it actually represents which is the opposite of what I'm imagining] and a optionally nonprescriptive interface for functionality. E.g. an option or function to slide by forecast date rather than time value. |
In the interest of shipping things rather than leaving every PR open, I've dropped it as a default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brief review, will be more thorough in second pass. Please add some tests for the new functions.
f9499e6
to
243c20e
Compare
@nmdefries ok, it's actually fixed. It was all inane docs stuff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. My two big comments are
- Add classes to errors and warnings for easier and more robust testing. This could be turned into an issue and done later.
- Reshuffle logic in
guess_column_name
to read more clearly.
Other comments are nits/come down to your judgement.
R/utils.R
Outdated
if (!(column_name %in% names(x))) { | ||
x <- tryCatch(x %>% rename(any_of(substitutions)), | ||
error = function(cond) { | ||
cli_abort("{names(x)[names(x) %in% substitutions]} are both/all valid substitutions. | ||
Either `rename` some yourself or drop some.") | ||
} | ||
) | ||
# if none of the names are in substitutions, and `column_name` isn't a column, we're missing a relevant column | ||
if (!any(names(x) %in% substitutions)) { | ||
cli_abort("There is no {column_name} column or similar name. See e.g. [`time_column_name()`] for a complete list") | ||
} | ||
if (any(substitutions != "")) { | ||
cli_inform("inferring {column_name} column.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: I find the logic in this chunk confusing. Here's an alternative flow that reads more clearly to me.
Move the "if none of the names are in substitutions..." chunk before the tryCatch
. Having the condition !any(names(x) %in% substitutions)
after the tryCatch
makes it read as if this block will always trigger. Downside to moving this before the tryCatch
is that it is no longer checking if the rename
was successful (although we shouldn't really need that).
If the rename
has an error, the rest of the tryCatch
expression won't be run, so I moved the cli_inform
there.
if (!(column_name %in% names(x))) { | |
x <- tryCatch(x %>% rename(any_of(substitutions)), | |
error = function(cond) { | |
cli_abort("{names(x)[names(x) %in% substitutions]} are both/all valid substitutions. | |
Either `rename` some yourself or drop some.") | |
} | |
) | |
# if none of the names are in substitutions, and `column_name` isn't a column, we're missing a relevant column | |
if (!any(names(x) %in% substitutions)) { | |
cli_abort("There is no {column_name} column or similar name. See e.g. [`time_column_name()`] for a complete list") | |
} | |
if (any(substitutions != "")) { | |
cli_inform("inferring {column_name} column.") | |
if (!(column_name %in% names(x))) { | |
# if none of the names are in substitutions, and `column_name` isn't a column, we're missing a relevant column | |
if (!any(names(x) %in% substitutions)) { | |
cli_abort("There is no {column_name} column or similar name. See e.g. [`time_column_name()`] for a complete list") | |
} | |
x <- tryCatch({ | |
tmp <- x %>% rename(any_of(substitutions)) | |
cli_inform("inferring {column_name} column.") | |
tmp | |
}, | |
error = function(cond) { | |
cli_abort("{names(x)[names(x) %in% substitutions]} are both/all valid substitutions. | |
Either `rename` some yourself or drop some.") | |
} | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the tmp? actually read it XD
Checklist
Please:
PR).
brookslogan, nmdefries.
DESCRIPTION
. Always incrementthe patch version number (the third number), unless you are making a
release PR from dev to main, in which case increment the minor version
number (the second number).
(backwards-incompatible changes to the documented interface) are noted.
Collect the changes under the next release number (e.g. if you are on
1.7.2, then write your changes under the 1.8 heading).
process.
Change explanations for reviewer
There's a couple of things we want to change about the arguments to
epi_df
and/oras_epi_df
:time_value
but are unambiguously meant to be that should be interpreted as such, e.g.date
.geo_value
but are unambiguously meant to be that should be interpreted as such, e.g.geo_id
.forecast_date
andtarget_date
you could tell ittime_value = target_date
to disambiguate)other_keys=
, by far the most useful part of thatepi_archive
s. Some have already happened (e.g.other_keys
is already separated)Current list of
time_value
equivalents:Current list of
geo_value
equivalents:Current list of
version
equivalents:Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch
other_keys
to be printed, a constructor parameter, and more clearly documented #186, though some features there are out of scope; going to make separate issues forprint.epi_df
should print the other_keys metadatakeys
documentationother_keys
#446as_epi_df
construction conviences #456Future extensions
time_value = join_by(month, year)
where month and year are separate columns